3D cache-oblivious multi-scale traversals of meshes using 8-reptile polyhedra
Abstract
In the field of numerical simulation, a multi-scale grid and its traversal can have a huge impact on the cache efficiency of the calculations performed. We present the Haverkort element traversal and refinement scheme, which is based on the bisection of 8-reptile tetrahedra. We developed a stack-assignment scheme that allows the Haverkort traversal to use stacks for storing the input data, the output data, and the temporary data of vertices and elements. Our parallel-plane approach to assigning stacks to vertices yields a solution that requires at most 8 stacks to store temporary vertex data for uniformly and multi-scale refined grids, independent of the refinement level. Furthermore, 8 is also the lower bound for the number of stacks under our parallel-plane approach. Additionally, we show that a general lower bound of 5 temporary stacks exists for storing the vertex data during a traversal. The combination of the Haverkort traversal with our Constant Stack algorithm is cache-oblivious, is suitable for multi-level cache hierarchies, maintains its performance for multi-level refined grids, and allows for fast and efficient space-filling-curve-based (re)partitioning. The Constant Stack algorithm has a time complexity of O(1) for stack assignment and stack access and can achieve a cache-miss ratio of less than 5.2%. For all of these reasons, we expect that a constant-number-of-stacks solution can compete with and outperform numerical simulations using cache-optimization techniques based on loop blocking when running on CPUs or GPUs. We also apply the approach used to obtain a constant-number-of-stacks solution for the Haverkort element traversal and refinement scheme to two other suitable 8-reptile polyhedra. Traversal and refinement schemes are given for an 8-reptile bisected cube and an 8-reptile bisected triangular prism, needing 9 and 7 stacks respectively.
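The stack-based access pattern described in the abstract can be illustrated with a deliberately simplified one-dimensional toy. This is not the Haverkort traversal and not the Constant Stack algorithm; it is a minimal sketch, assuming a chain of line elements whose vertices are read from an input stream, parked on a single temporary stack while they are still needed, and written to an output stream once finished. The function name `traverse_1d` and the "add 1 per touching element" update are hypothetical and chosen only for illustration.

```python
from typing import List


def traverse_1d(values: List[int]) -> List[int]:
    """Toy 1D traversal using only stack-style (push/pop) data access.

    Element k joins vertex k and vertex k+1. Each element adds 1 to both
    of its vertices (a stand-in for real element-wise numerical work).
    """
    if not values:
        return []

    input_stream = list(reversed(values))  # pop() yields vertices left to right
    temp: List[int] = []                   # temporary stack: vertices still needed
    output: List[int] = []                 # finished vertices, in traversal order

    # First touch of the leftmost vertex: move it from input to the temp stack.
    temp.append(input_stream.pop())

    while input_stream:
        left = temp.pop()            # last touch of the shared left vertex
        right = input_stream.pop()   # first touch of the right vertex
        left += 1                    # hypothetical element update
        right += 1
        output.append(left)          # left vertex is finished
        temp.append(right)           # right vertex is needed by the next element

    # The rightmost vertex has no further element; it is finished as well.
    output.append(temp.pop())
    return output


if __name__ == "__main__":
    print(traverse_1d([0, 0, 0, 0]))
```

In this toy, every access is a push or a pop at the top of a stack, so memory is touched strictly sequentially and one temporary stack suffices. The abstract's result concerns the much harder 3D case, where the authors report that at most 8 temporary vertex stacks are needed for their parallel-plane approach.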
Publication date: 2016